Data Analysis Project: Semi-Supervised Discovery of Named Entities and Relations from the Web

Authors

Sophie Wang

Tom Mitchell

Abstract

This project studies semi-supervised discovery of named entities, relational entities and prepositional phrase attachments within a read-the-web framework. The meaning of an entity can change and be updated faster on the web than in printed references. The main idea of this project is to study the feasibility of characterizing entities directly by web content. Contextual words around an entity on web pages are first extracted and converted into a bag-of-words (BOW) representation. We then apply several supervised and semi-supervised learning methods to these contextual words for three well-known research problems: named entity recognition, relation extraction and prepositional phrase attachment.
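The context-extraction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `context_bow`, the fixed token window, the simple regex tokenizer and the two sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter


def context_bow(pages, entity, window=3):
    """Count the words found within `window` tokens of each entity mention.

    Hypothetical helper: the window size and the tokenizer are illustrative
    assumptions, not details taken from the project itself.
    """
    target = entity.lower().split()
    n = len(target)
    bow = Counter()
    if n == 0:
        return bow  # nothing to match against
    for text in pages:
        # Lowercase and keep alphanumeric runs as tokens.
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == target:
                left = tokens[max(0, i - window):i]
                right = tokens[i + n:i + n + window]
                bow.update(left + right)
    return bow


# Made-up sample pages, used only to exercise the function.
pages = [
    "The mayor of Pittsburgh announced a new bridge project.",
    "Flights to Pittsburgh depart daily from the airport.",
]
bow = context_bow(pages, "Pittsburgh", window=2)
```

The resulting counts could then serve as a feature vector for a supervised or semi-supervised classifier. A real pipeline would likely also need stop-word handling and a weighting scheme, which are omitted here.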


Similar resources

Extracting Arabic Relations from the Web

There is a vast amount of unstructured Arabic information on the Web; this data is always organized in semi-structured text and cannot be used directly. This research proposes a semi-supervised technique that extracts binary relations between two Arabic named entities from the Web. Several works have been performed for relation extraction from Latin texts and as far as we know, there isn’t any ...


Learning on Partial-Order Hypergraphs

Graph-based learning methods explicitly consider the relations between two entities (i.e., vertices) for learning the prediction function. They have been widely used in semi-supervised learning, manifold ranking, and clustering, among other tasks. Enhancing the expressiveness of simple graphs, hypergraphs formulate an edge as a link to multiple vertices, so as to model the higher-order relation...


Using Corpus Statistics on Entities to Improve Semi-supervised Relation Extraction from the Web

Many errors produced by unsupervised and semi-supervised relation extraction (RE) systems occur because of wrong recognition of entities that participate in the relations. This is especially true for systems that do not use separate named-entity recognition components, instead relying on general-purpose shallow parsing. Such systems have greater applicability, because they are able to extract r...


A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model

Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...


Semi-supervised Statistical Inference for Business Entities Extraction and Business Relations Discovery

The sheer volume of user-contributed data on the Internet has motivated organizations to explore the collective business intelligence (BI) for improving business decisions making. One common problem for BI extraction is to accurately identify the entities being referred to in user-contributed comments. Although named entity recognition (NER) tools are available to identify basic entities in tex...




Publication date: 2009
